Final Project

Author

Addie Rhee, Vaila Heinemann, Hannah Dutta, Elizabeth Boland

Introduction

Data Description

This project aims to explore the relationship between economic growth and environmental impact by examining Carbon Dioxide (CO2) emissions per capita in relation to Gross Domestic Product (GDP) per capita. Through careful data cleaning and analysis, we expect to uncover trends and insights that can inform policy decisions and promote sustainable development practices.

By completing this analysis, we will contribute to the understanding of how economic development influences environmental outcomes and provide a data-driven foundation for future research in this area.

Variables

CO2 Emissions Per Capita

This variable is the consumption-based carbon dioxide total from the GM Long Series, reported in million tonnes, for 1950 to 2021. We pulled the data from Gapminder, which states that it collected the data from the Global Carbon Project, CDIAC, and other sources.

GDP Per Capita

The gross domestic product per capita, which “measures the value of everything produced in a country during a year, divided by the number of people.” It is adjusted for “differences in purchasing power (in international dollars, fixed 2017 prices, PPP based on 2017 ICP).” We use the years 1950 to 2021, which we pulled from Gapminder. Gapminder states that it collected the data from multiple sources.

Hypothesized Relationship Between the Variables

Hypothesis: There is a positive, linear relationship between GDP Per Capita and CO2 Emissions Per Capita. As a country’s economic output increases, its CO2 emissions per capita are likely to rise due to increased industrial activity, greater energy consumption, and higher standards of living, which often result in more energy-intensive lifestyles. However, based on the research we found, the relationship between GDP Per Capita and CO2 Emissions Per Capita will likely weaken in the future because of environmental policies and climate change concerns.

References:

https://www.sciencedirect.com/science/article/pii/S235248471500013X?via%3Dihub#screen-reader-main-title

https://www.iea.org/commentaries/the-relationship-between-growth-in-gdp-and-co2-has-loosened-it-needs-to-be-cut-completely

Discussion of Data Cleaning Process and Decisions

The data cleaning process involved several critical steps to ensure the integrity and consistency of the datasets before analysis.

First, we acquired the CO2 Emissions Per Capita and GDP Per Capita data from Gapminder for the years 1800-2022. We began by performing a consistency check to verify that both datasets included the same countries and years, ensuring uniformity across the data.
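
The consistency check described above could look something like the following minimal sketch. The data frames `co2` and `gdp`, their column names, and their values are made-up stand-ins, not the actual Gapminder files.

```r
# Toy long-format tables (hypothetical names and values) standing in for
# the two Gapminder downloads.
co2 <- data.frame(country = c("A", "A", "B", "C"),
                  year    = c(1950, 1951, 1950, 1950))
gdp <- data.frame(country = c("A", "A", "B", "D"),
                  year    = c(1950, 1951, 1950, 1950))

# Countries that appear in one dataset but not the other.
only_in_co2 <- setdiff(unique(co2$country), unique(gdp$country))
only_in_gdp <- setdiff(unique(gdp$country), unique(co2$country))

# TRUE when both datasets cover the same set of years.
same_years <- setequal(co2$year, gdp$year)
```

Any country listed in `only_in_co2` or `only_in_gdp` would be dropped later by an inner join, so it is worth inspecting these vectors before merging.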

Next, we addressed missing values by identifying and either imputing them or excluding entries with significant gaps to maintain dataset reliability. Data type conversion was essential, so we converted all relevant data to appropriate formats, such as numeric values for year, GDP, and CO2 emissions. We then merged the datasets using an inner join on the common keys, which were country and year, to create a unified dataset containing only entries present in both original datasets.
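
As a rough illustration of the type conversion and inner join, here is a small base R sketch. The column names `co2_pc` and `gdp_pc` and all values are assumptions made for the example; the real data may be shaped differently.

```r
# Toy tables (made-up values) where every column was read in as text.
co2 <- data.frame(country = c("A", "A", "B", "C"),
                  year    = c("1950", "1951", "1950", "1950"),
                  co2_pc  = c("1.2", "1.3", "0.4", "2.0"),
                  stringsAsFactors = FALSE)
gdp <- data.frame(country = c("A", "A", "B", "D"),
                  year    = c("1950", "1951", "1950", "1950"),
                  gdp_pc  = c("5000", "5200", "1500", "900"),
                  stringsAsFactors = FALSE)

# Convert text columns to numeric types.
co2$year   <- as.integer(co2$year)
co2$co2_pc <- as.numeric(co2$co2_pc)
gdp$year   <- as.integer(gdp$year)
gdp$gdp_pc <- as.numeric(gdp$gdp_pc)

# merge() performs an inner join by default (all = FALSE): only
# country-year pairs present in both tables are kept.
merged <- merge(co2, gdp, by = c("country", "year"))
```

In this toy example, countries C and D each appear in only one table, so neither survives the join.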

Lastly, we filtered the dataset to focus on the years 1950 to 2021. These cleaning steps ensured a high-quality, comprehensive dataset ready for thorough analysis of the relationship between CO2 emissions per capita and GDP per capita across different countries over the specified period.
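
The year filter can be sketched in one line of base R. The `merged` table below is a made-up stand-in for the joined dataset.

```r
# Toy merged table (hypothetical values) spanning years outside the study window.
merged <- data.frame(country = c("A", "A", "A", "A"),
                     year    = c(1949L, 1950L, 2021L, 2022L),
                     co2_pc  = c(1.1, 1.2, 3.0, 3.1),
                     gdp_pc  = c(4900, 5000, 20000, 20500))

# Keep only observations from 1950 through 2021, inclusive.
analysis_data <- subset(merged, year >= 1950 & year <= 2021)
```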

Data Cleaning for Missing Values

need to do

Data Import and Cleaning

Data Visualizations

1   The relationship between the two quantitative variables you are investigating.
(explanatory variable on the x-axis and response variable on the y-axis)

2   How this relationship (from #1) has changed over time. (can use an animated 

Linear Regression

•   Describe the statistical method used – linear regression.
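
The variable names in the model call (`average_co2`, `average_gdp`) and the 174 observations suggest one row per country, averaged over the years. Under that assumption, a minimal sketch of building such a table and fitting the model might look like this. The `panel` data frame and its values are made up for illustration.

```r
# Toy country-year data (made-up values); the real analysis would use the
# cleaned Gapminder data instead.
panel <- data.frame(
  country = rep(c("A", "B", "C", "D"), each = 2),
  gdp_pc  = c(1000, 1200, 5000, 5400, 20000, 21000, 40000, 42000),
  co2_pc  = c(0.2, 0.3, 1.5, 1.7, 6.0, 6.4, 11.0, 12.0)
)

# One row per country: average each variable across the years.
data_regression <- aggregate(cbind(co2_pc, gdp_pc) ~ country,
                             data = panel, FUN = mean)
names(data_regression) <- c("country", "average_co2", "average_gdp")

# Simple linear regression fit by ordinary least squares.
fit <- lm(average_co2 ~ average_gdp, data = data_regression)
summary(fit)
```

Averaging first means each country counts once in the fit, regardless of how many years of data it has.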

Call:
lm(formula = average_co2 ~ average_gdp, data = data_regression)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5427 -0.8304 -0.2768  0.2733  7.7137 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -8.204e-01  2.541e-01  -3.228  0.00149 ** 
average_gdp  4.982e-04  4.721e-05  10.554  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.47 on 172 degrees of freedom
Multiple R-squared:  0.3931,    Adjusted R-squared:  0.3895 
F-statistic: 111.4 on 1 and 172 DF,  p-value: < 2.2e-16

•   Present the estimated regression model (in notation).

Predicted Average CO2 Per Capita = -0.8204 + 0.0004982 × (Average GDP Per Capita)

•   Interpret the linear regression coefficients (in context).

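For every one-dollar increase (in international dollars) in a country's average GDP per capita, our model predicts an increase of 0.0004982 million tonnes in average CO2 emissions per capita. The intercept, -0.8204, is the predicted average CO2 emissions per capita for a country with a GDP per capita of zero. Negative emissions are not possible and no country has a GDP per capita of zero, so the intercept has no meaningful interpretation in context.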
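
To make the slope more concrete, the fitted equation can be evaluated at a couple of GDP values. This sketch uses only the coefficients reported above; the helper name `predict_co2` and the GDP values are hypothetical and chosen purely for illustration.

```r
# Coefficients copied from the regression output above.
intercept <- -0.8204
slope     <- 0.0004982

# Hypothetical helper: predicted CO2 per capita for a given GDP per capita.
predict_co2 <- function(gdp_pc) intercept + slope * gdp_pc

pred_10k <- predict_co2(10000)             # -0.8204 + 4.982 = 4.1616
diff_1k  <- predict_co2(11000) - pred_10k  # 1000 * slope = 0.4982
```

A difference of 1,000 international dollars in GDP per capita therefore corresponds to a predicted difference of about 0.5 in CO2 emissions per capita, which is easier to read than the per-dollar slope.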

•   Describe the fit of the regression model (both in table and written format).

# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.393         0.390  1.47      111. 2.16e-20     1  -313.  632.  641.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

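About 39.31% of the variation in average CO2 emissions per capita is explained by our linear model with average GDP per capita (R-squared = 0.3931). The remaining 60.69% of the variation is not explained by the model.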
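
As a quick sanity check, the fit statistics in the output can be reproduced from one another. The sketch below uses only numbers printed in the summary (the slope's t value and the residual degrees of freedom) and relies on standard identities for simple linear regression.

```r
# Values copied from the regression summary above.
t_slope  <- 10.554
df_resid <- 172
n        <- df_resid + 2  # one slope plus one intercept estimated

# With a single predictor, the F-statistic equals the squared t value.
f_stat <- t_slope^2                        # about 111.4

# R-squared can be recovered from F and the residual degrees of freedom.
r_squared <- f_stat / (f_stat + df_resid)  # about 0.3931

# Adjusted R-squared penalizes for the number of estimated parameters.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_resid  # about 0.3895
```

All three values agree with the reported output up to rounding.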

Model Fit

Make a nicely formatted table that presents the following:

•   The variance in the response values.
•   The variance in the fitted values from your regression model.
•   The variance in the residuals from your regression model.
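
One possible way to assemble the three variances is sketched below in base R. The data are made up and the object names are hypothetical; the real table would be computed from the fitted regression model instead.

```r
# Made-up data for illustration only.
x <- c(1000, 3000, 8000, 15000, 24000, 40000)
y <- c(0.3, 0.9, 2.5, 7.8, 9.0, 18.5)
fit <- lm(y ~ x)

# Variance of the response, the fitted values, and the residuals.
variance_table <- data.frame(
  Quantity = c("Response values", "Fitted values", "Residuals"),
  Variance = c(var(y), var(fitted(fit)), var(resid(fit)))
)
print(variance_table, row.names = FALSE)

# For least squares with an intercept, the fitted and residual variances
# sum to the response variance, so this ratio equals R-squared.
r_squared <- var(fitted(fit)) / var(y)
```

Dividing the fitted-value variance by the response variance gives the same R-squared that `summary()` reports, which is a useful check that the table was built correctly.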
